From segmental synthesis to acoustic rules using time dependent modeling techniques

Authors

  • Gérard Chollet
  • Gunnar Ahlbom
  • Frédéric Bimbot
  • Alvaro De Lima-Veiga
Abstract

Intelligible text-to-speech can be achieved by concatenating spectrally encoded segments. However, its lack of naturalness may be attributed to the difficulty of controlling speech parameters. Acoustic rules are better suited to this control. The aim of this work is to provide a methodology for moving from a segmental to a rule-based approach. A number of interactive tools are proposed that use powerful signal and data analysis techniques to model spectral evolution, infer spectral targets, and generate adequate transitions between these targets. The choice of adequate spectral parameters is essential. A set of French speech segments ("polysons") from a single speaker has been encoded using these tools. Spectral targets were constrained to belong to a finite set of vectors (allophonic targets). Coarticulation effects (vowel reduction, nasalisation, ...) can be accounted for by controlling the time duration of the temporal evolution functions. Segment concatenation problems are eliminated. The current issues are automatic procedures for selecting allophonic targets for new speakers and for grouping temporal patterns into rules.

UNLIMITED VOCABULARY SPEECH SYNTHESIS

Two main approaches have been proposed for unlimited vocabulary speech synthesis. The segmental approach (using diphones, demi-syllables, "polysons", ...) offers an easy path to intelligible speech. However, the segment inventory is speaker dependent, and controlling timing is a non-trivial task. Its lack of naturalness can be attributed to the difficulty of controlling speech parameters analytically. A rule-based approach is more flexible, gives more insight into the relevant features of speech, and may allow speaker modification. Control of prosody, speaking style, and sentence rhythm is achieved quite naturally within a unified framework. Unfortunately, this approach requires human knowledge and manual intervention for visual and auditory hand-tuning of the rules.
The time-dependent modeling techniques described below permit a structural description of spectral evolution in speech segments. From their results, rules can then be obtained by inferring spectral targets and extracting typical transition patterns between these targets.

MODELING TEMPORAL EVOLUTION

Speech can be encoded as a sequence of p-dimensional "spectral" vectors y(t), corresponding to a time sampling of the vocal tract transfer function. A synthesizer could either retrieve an appropriate sequence from memory or generate it by rules. The rules should predict the vector y(t) at every time instant t. To infer these rules, two analysis techniques have been tested: vectorial AR models and temporal decomposition.
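The rule side of this pipeline (targets plus transitions between them) can be illustrated with a minimal sketch. It assumes the standard temporal-decomposition form y(t) ≈ Σ_k a_k φ_k(t), in which each target vector a_k is weighted by a temporal evolution function φ_k(t). It is not the authors' implementation: the raised-cosine shape of φ_k, the normalisation so that the functions sum to 1 at every frame, the toy 2-D formant-like target values, and the names `TARGETS`, `snap_to_inventory`, `evolution_functions`, `synthesize` and `vowel_centre` are all hypothetical choices. The sketch snaps an estimated target to a finite inventory (standing in for allophonic targets) and shows how lengthening the neighbouring evolution functions makes a vowel undershoot its target, a simple stand-in for vowel reduction.

```python
import numpy as np

# Hypothetical inventory of "allophonic targets": toy 2-D vectors (think F1, F2 in Hz).
# The phone labels and values are illustrative only, not measured data.
TARGETS = {
    "a": np.array([700.0, 1200.0]),
    "i": np.array([300.0, 2300.0]),
    "t": np.array([400.0, 1800.0]),
}


def snap_to_inventory(estimate):
    """Return the label of the inventory vector nearest (Euclidean) to an estimated target."""
    return min(TARGETS, key=lambda name: np.linalg.norm(TARGETS[name] - estimate))


def evolution_functions(centers, widths, frames):
    """Raised-cosine bumps (hypothetical shape), normalised to sum to 1 at every covered frame."""
    raw = np.zeros((len(centers), len(frames)))
    for k, (c, w) in enumerate(zip(centers, widths)):
        d = np.abs(frames - c) / w
        raw[k] = np.where(d < 1.0, 0.5 * (1.0 + np.cos(np.pi * d)), 0.0)
    total = raw.sum(axis=0)
    total[total == 0.0] = 1.0  # frames covered by no function stay at zero
    return raw / total


def synthesize(labels, centers, widths, frames):
    """y(t) = sum_k a_k * phi_k(t); returns an array of shape (num_frames, p)."""
    phi = evolution_functions(centers, widths, frames)  # (K, T)
    a = np.stack([TARGETS[lab] for lab in labels])      # (K, p)
    return phi.T @ a                                     # (T, p)


def vowel_centre(consonant_width):
    """Value of y(t) at the vowel centre (frame 15) of a toy /t a t/ sequence."""
    frames = np.arange(31)
    y = synthesize(["t", "a", "t"], [0, 15, 30],
                   [consonant_width, 10, consonant_width], frames)
    return y[15]


if __name__ == "__main__":
    print(snap_to_inventory(np.array([420.0, 1750.0])))  # nearest inventory label
    print(vowel_centre(10.0))  # consonant functions end before the vowel centre
    print(vowel_centre(20.0))  # longer consonant functions pull the vowel towards /t/
```

With consonant functions 10 frames wide, the vowel centre reaches its target exactly; at 20 frames they overlap it, so the first component drops toward the /t/ value and the second rises toward it. Only the durations of the evolution functions change, not the targets, which mirrors the abstract's claim that coarticulation can be controlled through the time duration of these functions.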

Publication date: 1987